8 research outputs found

    Arabic Speaker-Independent Continuous Automatic Speech Recognition Based on a Phonetically Rich and Balanced Speech Corpus

    This paper describes and proposes an efficient and effective framework for the design and development of a speaker-independent continuous automatic Arabic speech recognition system based on a phonetically rich and balanced speech corpus. The speech corpus contains a total of 415 sentences recorded by 40 (20 male and 20 female) Arabic native speakers from 11 different Arab countries representing the three major regions (Levant, Gulf, and Africa) of the Arab world. The proposed Arabic speech recognition system is based on the Carnegie Mellon University (CMU) Sphinx tools; the Cambridge HTK tools were also used at some testing stages. The speech engine uses 3-emitting-state Hidden Markov Models (HMM) for tri-phone based acoustic models. Based on experimental analysis of about 7 hours of training speech data, the best acoustic model uses a continuous observation probability model of 16 Gaussian mixture distributions, with the state distributions tied to 500 senones. The language model contains both bi-grams and tri-grams. For similar speakers but different sentences, the system obtained word recognition accuracies of 92.67% and 93.88% and Word Error Rates (WER) of 11.27% and 10.07% with and without diacritical marks, respectively. For different speakers with similar sentences, the system obtained word recognition accuracies of 95.92% and 96.29% and WERs of 5.78% and 5.45% with and without diacritical marks, respectively. For different speakers and different sentences, the system obtained word recognition accuracies of 89.08% and 90.23% and WERs of 15.59% and 14.44% with and without diacritical marks, respectively.
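    Accuracy and WER figures like those above come from a word-level edit-distance alignment between reference and hypothesis transcripts. Below is a minimal Python sketch of that standard computation (illustrative only, not the authors' evaluation code; the transliterated example sentence is invented):

```python
# Minimal sketch of word-level WER computation via edit distance,
# as used to score ASR hypotheses against reference transcripts.
# Illustrative only; not the authors' actual evaluation code.

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table of edit distances between word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# Invented transliterated example: one deleted word out of five -> WER 0.2.
print(wer("kataba al walad al dars", "kataba walad al dars"))
```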

    Implicit aspect extraction in sentiment analysis: Review, taxonomy, opportunities, and open challenges

    Sentiment analysis is a branch of text classification, defined as the process of extracting sentiment terms (i.e., features/aspects or opinions) and determining their semantic orientation. At the aspect level, aspect extraction is the core task of sentiment analysis, and the extracted aspects can be either explicit or implicit. The growth of sentiment analysis has resulted in the emergence of various techniques for both explicit and implicit aspect extraction. However, the majority of research attempts have targeted explicit aspect extraction, which indicates a lack of research on implicit aspect extraction. This research provides a review of implicit aspect extraction techniques from different perspectives. The first perspective is a comparative analysis of the available implicit term extraction techniques, with a brief summary of each. The second perspective classifies and compares the performance, datasets, languages used, and shortcomings of the available techniques. In this study, over 50 articles were reviewed; however, only the 45 articles on implicit aspect extraction spanning 2005 to 2016 were analyzed and discussed. The majority of the research on implicit aspect extraction relies heavily on unsupervised methods, which account for about 64% of the 45 articles, followed by supervised methods at about 27% and semi-supervised methods at 9%. In addition, 25 articles conducted their research solely on product reviews, and 5 articles used product reviews jointly with other types of data, making product reviews the most frequently used data type. Furthermore, research on implicit aspect extraction has focused on the English and Chinese languages more than on other languages. Finally, this review provides recommendations for future research directions and open problems.
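    Many of the unsupervised techniques in this line of work infer an implicit aspect by mapping opinion words to the aspects with which they frequently co-occur in sentences where the aspect is explicit. The Python sketch below illustrates that general idea; the tiny corpus, opinion lexicon, and voting scheme are invented for illustration and do not come from any specific surveyed paper:

```python
# Minimal sketch of co-occurrence-based implicit aspect inference,
# one of the unsupervised ideas discussed in this line of work.
# The tiny corpus and opinion lexicon below are invented examples.
from collections import defaultdict

# Training sentences in which the aspect word is explicit.
labeled_sentences = [
    ("the price is cheap", "price"),
    ("cheap and affordable price", "price"),
    ("the screen is bright and sharp", "screen"),
    ("sharp display on a bright screen", "screen"),
]
opinion_words = {"cheap", "affordable", "bright", "sharp"}

# Count how often each opinion word co-occurs with each explicit aspect.
cooc = defaultdict(lambda: defaultdict(int))
for sentence, aspect in labeled_sentences:
    for word in sentence.split():
        if word in opinion_words:
            cooc[word][aspect] += 1

def infer_aspect(sentence: str) -> str | None:
    """Guess the implicit aspect of a sentence with no explicit aspect word."""
    votes = defaultdict(int)
    for word in sentence.split():
        for aspect, count in cooc.get(word, {}).items():
            votes[aspect] += count
    return max(votes, key=votes.get) if votes else None

print(infer_aspect("it is really cheap"))     # -> price
print(infer_aspect("very bright and sharp"))  # -> screen
```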

    Automatic person identification system using handwritten signatures

    This paper reports the design, implementation, and evaluation of an automatic person identification system using handwritten signature biometrics. The developed system mainly uses toolboxes provided by the MATLAB environment. In order to train and test the system, an in-house hand signatures database was created, containing the hand signatures of 100 persons (50 males and 50 females), each repeated 30 times, for a total of 3000 hand signatures. The collected signatures went through pre-processing steps such as digitizing the signatures with a scanner, converting the input images to a standard binary format, cropping, normalizing image size, and reshaping, in order to produce a ready-to-use database for training and testing the system. Global features, such as signature height, image area, pure width, and pure height, which reflect the structure of the hand signature image, are then selected for use in the system. For feature training and classification, the Multi-Layer Perceptron (MLP) architecture of Artificial Neural Networks (ANN) is used. This paper also investigates the effect of the persons' gender on the overall performance of the system and, for performance optimization, the effect of modifying basic ANN parameters such as the number of hidden neurons and the number of epochs. The handwritten signature data collected from male persons outperformed those collected from female persons, with the system obtaining average recognition rates of 76.20% and 74.20% for male and female persons, respectively. Overall, the handwritten signatures based system obtained an average recognition rate of 75.20% for all persons.
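    The global features named in the abstract can be read directly off a binarized signature image from its ink pixels and their bounding box. Since the paper does not give exact formulas, the definitions in this Python sketch are assumptions:

```python
# Minimal sketch of the global signature features named in the abstract
# (image area, pure width, pure height, signature height), computed from a
# binary image where ink pixels are 1. The exact definitions are assumptions.
import numpy as np

def global_features(binary_image: np.ndarray) -> dict:
    rows = np.any(binary_image, axis=1)   # rows containing ink
    cols = np.any(binary_image, axis=0)   # columns containing ink
    top, bottom = np.where(rows)[0][[0, -1]]
    left, right = np.where(cols)[0][[0, -1]]
    return {
        "image_area": int(binary_image.sum()),           # number of ink pixels
        "pure_height": int(bottom - top + 1),            # ink bounding-box height
        "pure_width": int(right - left + 1),             # ink bounding-box width
        "signature_height": int(binary_image.shape[0]),  # assumed: normalized image height
    }

# Toy 5x8 "signature": a single diagonal stroke.
img = np.zeros((5, 8), dtype=np.uint8)
for i in range(5):
    img[i, i + 1] = 1
print(global_features(img))
```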

    Voice based automatic person identification system using vector quantization

    This paper presents the design, implementation, and evaluation of an automatic person identification system using voice biometrics. The developed system mainly uses toolboxes provided by the MATLAB environment. To extract features from voice signals, the Mel-Frequency Cepstral Coefficients (MFCC) technique is applied, producing a set of feature vectors; the system then uses Vector Quantization (VQ) for feature training and classification. In order to train and test the system, an in-house voice database was created, containing recordings of the usernames of 100 persons (50 males and 50 females), each repeated 30 times, for a total of 3000 utterances. This paper also investigates the effect of the persons' gender on the overall performance of the system. The voice data collected from female persons outperformed those collected from male persons, with the system obtaining average recognition rates of 94.20% and 91.00% for female and male persons, respectively. Overall, the voice based system obtained an average recognition rate of 92.60% for all persons.
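    A minimal sketch of the MFCC-plus-VQ pipeline described above, using librosa for MFCC extraction and scipy's k-means as a stand-in for codebook training (the original system used MATLAB toolboxes; the file names and the codebook size of 32 are assumptions):

```python
# Minimal sketch of MFCC + Vector Quantization speaker identification:
# one VQ codebook per enrolled person; a test utterance is assigned to the
# person whose codebook yields the lowest average quantization distortion.
import numpy as np
import librosa
from scipy.cluster.vq import kmeans, vq

def mfcc_vectors(wav_path: str) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)
    # One 13-dimensional feature vector per analysis frame: shape (frames, 13).
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

def train_codebook(wav_paths: list[str], size: int = 32) -> np.ndarray:
    feats = np.vstack([mfcc_vectors(p) for p in wav_paths]).astype(np.float64)
    codebook, _ = kmeans(feats, size)  # k-means stands in for LBG training
    return codebook

def identify(wav_path: str, codebooks: dict[str, np.ndarray]) -> str:
    feats = mfcc_vectors(wav_path).astype(np.float64)
    # Average quantization distortion against each person's codebook.
    distortions = {person: vq(feats, cb)[1].mean() for person, cb in codebooks.items()}
    return min(distortions, key=distortions.get)

# Usage (paths are placeholders):
# codebooks = {"person_01": train_codebook(["p01_take1.wav", "p01_take2.wav"])}
# print(identify("unknown.wav", codebooks))
```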

    English digits speech recognition system based on Hidden Markov Models

    This paper aims to design and implement an English digits speech recognition system using MATLAB (GUI). This work is based on the Hidden Markov Model (HMM), which provides a highly reliable way of recognizing speech. The system recognizes the speech waveform by translating it into a set of feature vectors using the Mel Frequency Cepstral Coefficients (MFCC) technique. This paper covers all English digits (zero through nine), based on an isolated-words structure. Two modules were developed, namely isolated words speech recognition and continuous speech recognition. Both modules were tested in clean and noisy environments and showed successful recognition rates. In the clean environment, the isolated words module achieved 99.5% in multi-speaker mode and 79.5% in speaker-independent mode, while the continuous speech module achieved 72.5% in multi-speaker mode and 56.25% in speaker-independent mode. In the noisy environment, the isolated words module achieved 88% in multi-speaker mode and 67% in speaker-independent mode, while the continuous speech module achieved 82.5% in multi-speaker mode and 76.67% in speaker-independent mode. These recognition rates are relatively successful compared to similar systems.
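    A minimal sketch of isolated-word recognition with one whole-word HMM per digit: train a Gaussian HMM on the MFCC frames of each digit's training utterances, then label a test utterance with the highest-scoring model. hmmlearn stands in here for the authors' MATLAB implementation; the model sizes and file names are assumptions:

```python
# Minimal sketch of isolated-word digit recognition with one HMM per word.
# hmmlearn stands in for the authors' MATLAB implementation; details assumed.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def mfcc(wav_path: str) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (frames, 13)

def train_digit_model(wav_paths: list[str]) -> GaussianHMM:
    feats = [mfcc(p) for p in wav_paths]
    X = np.vstack(feats)
    lengths = [len(f) for f in feats]  # frame count per training utterance
    model = GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

def recognize(wav_path: str, models: dict[str, GaussianHMM]) -> str:
    X = mfcc(wav_path)
    scores = {digit: m.score(X) for digit, m in models.items()}
    return max(scores, key=scores.get)  # digit model with highest log-likelihood

# Usage (paths are placeholders):
# models = {d: train_digit_model([f"{d}_take{i}.wav" for i in range(1, 21)])
#           for d in ["zero", "one", "two"]}
# print(recognize("unknown.wav", models))
```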

    Phonetically rich and balanced speech corpus for Arabic speaker-independent continuous automatic speech recognition systems

    This paper describes an efficient framework for designing and developing Arabic speaker-independent continuous automatic speech recognition systems based on a phonetically rich and balanced speech corpus. The speech corpus contains 415 sentences recorded by 42 (21 male and 21 female) Arabic native speakers from 11 Arab countries representing three major regions (Levant, Gulf, and Africa). The developed system is based on the Carnegie Mellon University (CMU) Sphinx tools; the Cambridge HTK tools were also used in some testing stages. The speech engine uses 3-emitting-state Hidden Markov Models (HMM) for tri-phone based acoustic models. Based on experimental analysis of 4.07 hours of training speech data, the acoustic model used a continuous observation probability model of 16 Gaussian mixture distributions, with the state distributions tied to 400 senones. The language model contains both bi-grams and tri-grams. The system obtained 91.23% and 92.54% correct word recognition with and without diacritical marks, respectively.
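    A corpus is phonetically rich when every phoneme of the language is covered, and balanced when phoneme frequencies approximate their natural distribution in the language. The Python sketch below checks both properties for a phone-transcribed corpus; the phone inventory and transcriptions are toy placeholders, not the actual corpus:

```python
# Minimal sketch of checking phonetic richness (coverage) and balance
# (frequency distribution) for a phone-transcribed corpus. The phone
# inventory and transcriptions are invented placeholders.
from collections import Counter

phone_set = {"b", "t", "k", "d", "r", "s", "a", "i", "u"}  # toy inventory

corpus = [                       # one phone transcription per sentence
    ["k", "a", "t", "a", "b", "a"],
    ["d", "a", "r", "a", "s", "a"],
    ["k", "u", "t", "i", "b", "a"],
]

counts = Counter(p for sent in corpus for p in sent)

# Richness: every phone in the inventory appears at least once.
missing = phone_set - counts.keys()
print("missing phones:", missing or "none")

# Balance: inspect each phone's relative frequency (ideally it should
# approximate the phone's natural frequency in the language).
total = sum(counts.values())
for phone, n in counts.most_common():
    print(f"{phone}: {n / total:.2%}")
```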

    Fusion of speech and handwritten signatures biometrics for person identification

    Automatic person identification (API) using human biometrics, where a person is automatically identified by his/her distinct characteristics such as speech, fingerprint, iris, and handwritten signature, is essential and in high demand compared to traditional identification methods. Fusing more than one human biometric produces bimodal and multimodal API systems that normally outperform single-modality systems. This paper presents our work towards fusing speech and handwritten signatures to develop a bimodal API system, where fusion is conducted at the decision level due to the differences in the type and format of the extracted features. A data set was created containing recordings of the usernames and the handwritten signatures of 100 persons (50 males and 50 females), where each person recorded his/her username 30 times and provided his/her handwritten signature 30 times, for a total of 3000 utterances and 3000 handwritten signatures. The speech API uses the Mel-Frequency Cepstral Coefficients (MFCC) technique for feature extraction and Vector Quantization (VQ) for feature training and classification. The handwritten signatures API uses global features reflecting the structure of the hand signature image, such as image area, pure height, pure width, and signature height, and the Multi-Layer Perceptron (MLP) architecture of Artificial Neural Networks for feature training and classification. Once the best matches for both the speech and the handwritten signatures API are produced, the fusion process takes place at the decision level: it computes the difference between the two best matches for each modality and selects the modality of the maximum difference. Based on our experimental results, the bimodal API obtained an average recognition rate of 96.40%, whereas the speech API and the handwritten signatures API obtained average recognition rates of 92.60% and 75.20%, respectively. Therefore, the bimodal API system outperforms the single-modality API systems.
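    A minimal sketch of the decision-level fusion rule described above: each modality ranks the enrolled identities, and the modality whose top two matches are separated by the larger margin is trusted for the final decision. The scores here are assumed to be normalized, higher-is-better values per modality (the VQ distortions of the speech system would first need to be converted accordingly):

```python
# Minimal sketch of the decision-level fusion rule described in the abstract:
# trust the modality whose top two matches differ by the largest margin.
# Scores are assumed normalized and higher-is-better for both modalities.

def fuse(speech_scores: dict[str, float], signature_scores: dict[str, float]) -> str:
    def top_two_margin(scores: dict[str, float]) -> tuple[str, float]:
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        return best[0], best[1] - runner_up[1]  # (identity, confidence margin)

    speech_id, speech_margin = top_two_margin(speech_scores)
    sig_id, sig_margin = top_two_margin(signature_scores)
    # Select the decision of the modality with the larger margin.
    return speech_id if speech_margin >= sig_margin else sig_id

print(fuse({"ali": 0.90, "sara": 0.60},   # speech: clear winner
           {"ali": 0.55, "sara": 0.50}))  # signature: ambiguous -> "ali"
```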

    Modern standard Arabic speech corpus for implementing and evaluating automatic continuous speech recognition systems

    This paper presents our work towards developing a new speech corpus for Modern Standard Arabic (MSA), which can be used for implementing and evaluating Arabic speaker-independent, large-vocabulary, automatic, and continuous speech recognition systems. The speech corpus was recorded by 40 (20 male and 20 female) Arabic native speakers from 11 countries representing three major regions (Levant, Gulf, and Africa). Three development phases were conducted, varying the size of the training data, the number of Gaussian mixture distributions, and the number of tied states (senones). In the third development phase, using 11 hours of training speech data, the acoustic model is composed of 16 Gaussian mixture distributions with the state distributions tied to 300 senones. Using three different data sets, the third development phase obtained an average word recognition correctness rate of 94.32% and an average Word Error Rate (WER) of 8.10% for the same speakers with different sentences (testing sentences). For different speakers with the same sentences (training sentences), this work obtained an average word recognition correctness rate of 98.10% and an average WER of 2.67%, whereas for different speakers with different sentences (testing sentences) it obtained an average word recognition correctness rate of 93.73% and an average WER of 8.75%.
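    A minimal sketch of partitioning recordings into the three evaluation conditions reported above, given speaker and sentence labels per utterance. The data layout, speaker IDs, and sentence IDs are invented placeholders:

```python
# Minimal sketch of splitting a corpus into the three evaluation conditions
# reported in the abstract, given (speaker, sentence_id) labels per utterance.
# The data layout and the train/test labels below are invented placeholders.

utterances = [  # (speaker, sentence_id, wav_path) placeholders
    ("spk01", 1, "spk01_s1.wav"), ("spk01", 9, "spk01_s9.wav"),
    ("spk40", 1, "spk40_s1.wav"), ("spk40", 9, "spk40_s9.wav"),
]
train_speakers = {"spk01"}   # speakers heard during training
train_sentences = {1}        # sentence ids seen during training

conditions = {"same_spk_diff_sent": [], "diff_spk_same_sent": [], "diff_spk_diff_sent": []}
for spk, sent, path in utterances:
    if spk in train_speakers and sent not in train_sentences:
        conditions["same_spk_diff_sent"].append(path)
    elif spk not in train_speakers and sent in train_sentences:
        conditions["diff_spk_same_sent"].append(path)
    elif spk not in train_speakers and sent not in train_sentences:
        conditions["diff_spk_diff_sent"].append(path)
    # Utterances matching both training speakers and sentences are training data.

for name, paths in conditions.items():
    print(name, paths)
```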